Abstract. Although Extract Method is a key refactoring for improving program
comprehension, refactoring tools for this purpose are often underused. To address
this shortcoming, we present JExtract, a recommendation system based on structural
similarity that identifies Extract Method refactoring opportunities that are directly
automated by IDE-based refactoring tools. Our evaluation suggests that JExtract is
more effective (w.r.t. recall and precision) than JDeodorant, a state-of-the-art tool,
at identifying contiguous misplaced code in methods.

Tool demonstration video. http://youtu.be/6htJOzXwRNA 

1. Introduction 

Refactoring has increased in importance as a technique for improving the design of
existing code [2], e.g., to increase cohesion, decrease coupling, and foster maintainability.
In particular, Extract Method is a key refactoring for improving program comprehension.
Besides promoting reuse and reducing code duplication, it contributes to readability
and comprehensibility by encouraging the extraction of self-documenting methods B.

Nevertheless, recent empirical research indicates that, while Extract Method is one
of the most common refactorings, automated tools supporting this refactoring are most of
the time underused flU [4]. For example, Negara et al. found that Extract Method is the
third most frequent refactoring, but the number of developers who apply the refactoring
manually is higher than the number of those who apply it automatically [5]. Moreover,
current tools focus only on automating the application of refactorings, whereas developers
expend considerable effort on the manual identification of refactoring opportunities.

To address this shortcoming, this paper presents JExtract, a tool that implements
a novel approach for recommending automated Extract Method refactorings. The tool was
designed as a plug-in for the Eclipse IDE that automatically identifies, ranks, and applies
the refactoring when requested. Thus, JExtract may help developers find refactoring
opportunities and contribute to a widespread adoption of refactoring practices. The
underlying technique is inspired by the separation of concerns design guideline. More
specifically, we assume that the structural dependencies established by Extract Method
candidates should be very different from the ones established by the remaining statements
in the original method.

The remainder of this paper is structured as follows. Section 2 describes the
JExtract tool, including its design and implementation. Section 3 discusses related tools,
and Section 4 presents final remarks.



2. The JExtract tool 


JExtract is a tool that analyzes the source code of methods and recommends Extract
Method refactoring opportunities, as illustrated in Figure 1. First, the tool generates all
Extract Method possibilities for each method. Second, these possibilities are ranked
according to a scoring function based on the similarity between sets of dependencies
established in the code.


Figure 1. The JExtract tool


This main section of the paper is organized as follows. Subsection 2.1 provides an
overview of our approach for identifying Extract Method refactoring opportunities.
Subsection 2.2 describes the design and implementation of the tool. Finally, Subsection 2.3
presents the results of our evaluation on open-source systems. A detailed description of the
recommendation technique behind JExtract is presented in a recent full technical paper [9].


2.1. Proposed Approach 

The approach is divided into three phases: Generation of Candidates, Scoring, and Ranking.


2.1.1. Generation of candidates 


This phase is responsible for identifying all possible Extract Method refactoring
opportunities. First, we split the methods into blocks, which consist of sequential statements
that follow a linear control flow. As an example, Figure 2 presents method mouseReleased
of class SelectionClassifierBox, extracted from ArgoUML. Each statement is labeled using
the SX.Y pattern, where X and Y denote the block and the statement, respectively. For
example, S2.3 is the third statement of the second block, which declares a variable cw.



    public void mouseReleased(MouseEvent me) {
        for (Button btn : buttons) {                                                 // S1.1
            int cx = btn.fig.getX() + btn.fig.getWidth() - btn.icon.getIconWidth(); // S2.1
            int cy = btn.fig.getY();                                                 // S2.2
            int cw = btn.icon.getIconWidth();                                        // S2.3
            int ch = btn.icon.getIconHeight();                                       // S2.4
            Rectangle rect = new Rectangle(cx, cy, cw, ch);                          // S2.5
            if (rect.contains(me.getX(), me.getY())) {                               // S2.6
                Object metaType = btn.metaType;                                      // S3.1
                FigClassifierBox fcb = (FigClassifierBox) getContent();              // S3.2
                FigCompartment fc = fcb.getCompartment(metaType);                    // S3.3
                fc.setEditOnRedraw(true);                                            // S3.4
                fc.createModelElement();                                             // S3.5
                me.consume();                                                        // S3.6
                return;                                                              // S3.7
            }
        }
        super.mouseReleased(me);                                                     // S1.2
    }

Figure 2. An Extract Method candidate in a method of ArgoUML (S3.2 to S3.5)






























































Second, we generate all Extract Method candidates based on Algorithm 1 (extracted
from [9]).

Algorithm 1. Candidates generation algorithm [9]

Input: A method M
Output: List with Extract Method candidates
 1: Candidates ← ∅
 2: for all block B ∈ M do
 3:   n ← statements(B)
 4:   for i ← 1, n do
 5:     for j ← i, n do
 6:       C ← subset(B, i, j)
 7:       if isValid(C) then
 8:         Candidates ← Candidates + C
 9:       end if
10:     end for
11:   end for
12: end for
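To make the enumeration concrete, the following Java sketch transcribes Algorithm 1 under simplifying assumptions: a method is modeled as a list of blocks, each block as the ordered list of its top-level statements (a statement is assumed to carry its children, so they are never selected separately), and isValid is passed in as a predicate. The names CandidateGenerator and generate are hypothetical and are not part of JExtract's code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical transcription of Algorithm 1: a method is a list of blocks,
 *  and each block is the ordered list of its top-level statements. */
final class CandidateGenerator {

    /** Enumerates every contiguous run of statements inside each block. */
    static <S> List<List<S>> generate(List<List<S>> method, Predicate<List<S>> isValid) {
        List<List<S>> candidates = new ArrayList<>();                 // line 1
        for (List<S> block : method) {                                 // line 2
            int n = block.size();                                      // line 3
            for (int i = 1; i <= n; i++) {                             // line 4
                for (int j = i; j <= n; j++) {                         // line 5
                    // subset(B, i, j): statements i..j, 1-based and inclusive
                    List<S> c = new ArrayList<>(block.subList(i - 1, j)); // line 6
                    if (isValid.test(c)) {                             // line 7
                        candidates.add(c);                             // line 8
                    }
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // The three blocks of mouseReleased (Figure 2), identified by their labels only.
        List<List<String>> method = List.of(
                List.of("S1.1", "S1.2"),
                List.of("S2.1", "S2.2", "S2.3", "S2.4", "S2.5", "S2.6"),
                List.of("S3.1", "S3.2", "S3.3", "S3.4", "S3.5", "S3.6", "S3.7"));
        List<List<String>> all = generate(method, c -> true);
        System.out.println(all.size() + " candidates before filtering");
    }
}
```

Accepting every subset yields n(n+1)/2 candidates per block of n statements; for the three blocks of Figure 2 (2, 6, and 7 statements) that is 3 + 21 + 28 = 52 subsets before isValid filters them.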

Fundamentally, the three nested loops in Algorithm 1 (lines 2, 4, and 5) ensure
that the list of selected statements satisfies the following preconditions:

• Only contiguous statements inside a block are selected. In Figure 2, for example,
it is not possible to select a candidate with S3.2 and S3.4 without including S3.3.

• The selected statements are part of a single block of statements. In Figure 2, for
example, it is not possible to generate a candidate with both S2.6 and S3.1 since
they belong to distinct blocks.

• When a statement is selected, its children statements are also included.
In Figure 2, for example, when statement S2.6 is selected, its children statements
S3.1 to S3.7 are also included.

Last but not least, not every iteration of the loop yields an Extract Method
candidate because: (i) a candidate recommendation must respect a size threshold defined
by parameter Minimum Extracted Statements, whose value is preset to 3 (changeable),
which means that an Extract Method candidate must have at least three statements; and
(ii) a candidate recommendation must respect the preconditions defined by the Extract
Method refactoring engine.
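The size part of isValid can be sketched as follows. We assume, since the text does not specify it, that Minimum Extracted Statements also counts nested children, so selecting S2.6 alone covers eight statements (S2.6 plus S3.1 to S3.7). Stmt, SizeThreshold, statementCount, and meetsThreshold are hypothetical names, and the refactoring-engine preconditions of item (ii) are deliberately not modeled.

```java
import java.util.List;

/** Hypothetical statement node: selecting a statement implicitly selects its children. */
final class Stmt {
    final String label;
    final List<Stmt> children;

    Stmt(String label, Stmt... children) {
        this.label = label;
        this.children = List.of(children);
    }

    /** Counts this statement plus every statement nested below it. */
    int size() {
        int total = 1;
        for (Stmt child : children) {
            total += child.size();
        }
        return total;
    }
}

final class SizeThreshold {
    /** Preset value of Minimum Extracted Statements, as stated in the text. */
    static final int MINIMUM_EXTRACTED_STATEMENTS = 3;

    /** Statements covered by a selection, nested children included (assumption). */
    static int statementCount(List<Stmt> selection) {
        int total = 0;
        for (Stmt s : selection) {
            total += s.size();
        }
        return total;
    }

    /** Size check of isValid(C) only; engine preconditions are not checked here. */
    static boolean meetsThreshold(List<Stmt> selection, int minimum) {
        return statementCount(selection) >= minimum;
    }
}
```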


2.1.2. Scoring 

This phase is responsible for scoring the possible Extract Method refactoring
opportunities generated in the previous phase, using a technique inspired by a Move Method
recommendation heuristic 0. Let m' be the selection of statements of an Extract Method
candidate and m'' the remaining statements in the original method m. The proposed
heuristic aims to minimize the structural similarity between m' and m''.

Structural Dependencies: The sets of dependencies established by a selection of statements
S with variables, types, and packages are denoted by $Dep_{var}(S)$, $Dep_{type}(S)$, and
$Dep_{pack}(S)$, respectively. These sets are constructed as described next.

• Variables: If a statement s from a selection of statements S declares, assigns, or
reads a variable v, then v is added to $Dep_{var}(S)$. Reads from and writes to formal
parameters and fields are considered in a similar way.

• Types: If a statement s from a selection of statements S uses a type (class or
interface) T, then T is added to $Dep_{type}(S)$.

• Packages: For each type T included in $Dep_{type}(S)$, as described in the previous
item, the package where T is implemented and all its parent packages are also
included in $Dep_{pack}(S)$.
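As an illustration of the Packages rule, the sketch below derives $Dep_{pack}(S)$ from fully qualified type names. It assumes that every entry of $Dep_{type}(S)$ is available as a qualified name such as java.awt.event.MouseEvent and that nested types have already been mapped to their top-level type (otherwise an enclosing class would be mistaken for a package). PackageDependencies and packagesOf are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

final class PackageDependencies {
    /** Builds Dep_pack(S) from the fully qualified names in Dep_type(S) (assumption). */
    static Set<String> packagesOf(Set<String> qualifiedTypeNames) {
        Set<String> packages = new TreeSet<>();
        for (String type : qualifiedTypeNames) {
            // Dropping the simple name gives the declaring package, e.g. java.awt.event.
            int end = type.lastIndexOf('.');
            // Walk up the hierarchy: java.awt.event, then java.awt, then java.
            // Types in the default package (no dot) contribute nothing.
            while (end > 0) {
                String pkg = type.substring(0, end);
                packages.add(pkg);
                end = pkg.lastIndexOf('.');
            }
        }
        return packages;
    }
}
```

For example, packagesOf(Set.of("java.awt.Rectangle", "java.awt.event.MouseEvent")) yields {java, java.awt, java.awt.event}, so types declared in different packages still share ancestor packages such as java.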

For instance, let m' be the highlighted code in Figure 2 (i.e., the Extract Method
candidate S3.2 to S3.5) and m'' the remaining statements in the original method
mouseReleased. On one hand, $Dep_{var}(m') = \{metaType, fc, fcb\}$. On the other hand,
$Dep_{var}(m'') = \{metaType, btn, cy, cx, cw, ch, buttons, me, rect\}$.
In this case, the intersection between these two sets contains only metaType. Moreover,
the computation of fc and fcb is isolated from the remaining code. Therefore, one can
claim that m' is cohesive and decoupled from m'', i.e., a good separation of concerns is
achieved.


Scoring Function: To compute the dissimilarity between m' and m'', we rely on the
distance between the dependency sets Dep' and Dep'' using the Kulczynski similarity
coefficient [10, 11]:

$$ dist(Dep', Dep'') = 1 - \frac{1}{2}\left[\frac{a}{a+b} + \frac{a}{a+c}\right] $$

where $a = |Dep' \cap Dep''|$, $b = |Dep' \setminus Dep''|$, and $c = |Dep'' \setminus Dep'|$.
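The distance above can be transcribed directly into Java. The sketch below is our own illustration rather than JExtract's implementation; returning 1.0 when the sets share no element (which also covers empty sets, where the fractions would be 0/0) is an assumption that the formula itself leaves open.

```java
import java.util.HashSet;
import java.util.Set;

final class KulczynskiDistance {
    /** dist(Dep', Dep'') = 1 - 1/2 * [a/(a+b) + a/(a+c)]. */
    static <T> double dist(Set<T> depA, Set<T> depB) {
        Set<T> common = new HashSet<>(depA);
        common.retainAll(depB);
        int a = common.size();      // |Dep' ∩ Dep''|
        int b = depA.size() - a;    // |Dep' \ Dep''|
        int c = depB.size() - a;    // |Dep'' \ Dep'|
        if (a == 0) {
            // No shared dependency: similarity 0, distance 1 (assumption; also avoids 0/0).
            return 1.0;
        }
        double similarity = 0.5 * ((double) a / (a + b) + (double) a / (a + c));
        return 1.0 - similarity;
    }
}
```

Applied to the variable sets of the mouseReleased example (a = 1, b = 2, c = 8), it gives 1 - ½(1/3 + 1/9) = 7/9 ≈ 0.78; the dissimilarity score shown by the tool additionally averages the type and package distances.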


Thus, let m' be the selection of statements of an Extract Method candidate for
method m. Also, let m'' be the remaining statements in m. The score of m' is defined as:

$$ score(m') = \frac{1}{3}\, dist(Dep_{var}(m'), Dep_{var}(m'')) + \frac{1}{3}\, dist(Dep_{type}(m'), Dep_{type}(m'')) + \frac{1}{3}\, dist(Dep_{pack}(m'), Dep_{pack}(m'')) $$

The scoring function is centered on the observation that a good Extract Method 
candidate should encapsulate the use of variables, types, and packages. In other words, 
we should maximize the distance between the dependency sets Dep' and Dep". 
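A direct reading of the score formula, assuming the three dependency sets of m' and m'' have already been computed, is sketched below. DependencySets and CandidateScore are hypothetical names; the distance is passed in as a function so that the Kulczynski distance defined above (or any alternative) can be plugged in.

```java
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical container for Dep_var, Dep_type, and Dep_pack of a selection. */
final class DependencySets {
    final Set<String> variables;
    final Set<String> types;
    final Set<String> packages;

    DependencySets(Set<String> variables, Set<String> types, Set<String> packages) {
        this.variables = variables;
        this.types = types;
        this.packages = packages;
    }
}

final class CandidateScore {
    /** score(m') = 1/3 dist_var + 1/3 dist_type + 1/3 dist_pack; higher means more dissimilar. */
    static double score(DependencySets extracted, DependencySets remaining,
                        ToDoubleBiFunction<Set<String>, Set<String>> dist) {
        return (dist.applyAsDouble(extracted.variables, remaining.variables)
                + dist.applyAsDouble(extracted.types, remaining.types)
                + dist.applyAsDouble(extracted.packages, remaining.packages)) / 3.0;
    }
}
```

Because each distance lies in [0, 1], the equally weighted average also lies in [0, 1].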


2.1.3. Ranking 

This phase is responsible for ranking and filtering the Extract Method candidates based
on the score computed in the previous phase. We sort the candidates and filter them
according to the following parameters: (i) Maximum Recommendations per Method, whose
value is preset to 3 (changeable), which means that the tool triggers up to three
recommendations for each method; and (ii) Minimum Score Value, which has to be configured
when the user wants to set up a minimum dissimilarity threshold.
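The ranking step can be sketched with Java streams. Two choices are assumptions rather than documented behavior: the Minimum Score Value filter is applied before the top-k cut, and a caller who does not configure a threshold passes 0.0 (every score defined above lies in [0, 1]). Ranker, Scored, and rank are hypothetical names, and the precondition filtering performed through the Code Analyzer is not modeled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class Ranker {
    /** Preset value of Maximum Recommendations per Method, as stated in the text. */
    static final int MAX_RECOMMENDATIONS_PER_METHOD = 3;

    /** Hypothetical scored candidate: a label for the selected statements plus its score. */
    static final class Scored {
        final String statements;
        final double score;

        Scored(String statements, double score) {
            this.statements = statements;
            this.score = score;
        }
    }

    /** Drops scores below minScore, sorts by descending dissimilarity, keeps the best k. */
    static List<Scored> rank(List<Scored> candidates, int maxPerMethod, double minScore) {
        return candidates.stream()
                .filter(c -> c.score >= minScore)
                .sorted(Comparator.comparingDouble((Scored c) -> c.score).reversed())
                .limit(maxPerMethod)
                .collect(Collectors.toList());
    }
}
```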

2.2. Internal Architecture and Interface 

We implemented JExtract as a plug-in on top of the Eclipse platform. Therefore, we
rely mainly on native Eclipse APIs, such as Java Development Tools (JDT) and Language
Toolkit (LTK). The current JExtract implementation follows an architecture with five
main modules:


1. Code Analyzer: This module provides the following services to other modules:
(a) it builds the structure of blocks and statements (refer to Subsection 2.1.1);
(b) it extracts the structural dependencies (refer to Subsection 2.1.2); and (c) it
checks whether an Extract Method candidate satisfies the underlying Eclipse Extract
Method refactoring preconditions. In fact, this module contains most of the communication
between JExtract and Eclipse APIs (e.g., org.eclipse.jdt.core and
org.eclipse.ltk.core.refactoring).

2. Candidate Generator: This module generates all Extract Method candidates
based on Algorithm 1 and hence depends on service (a) of module Code Analyzer.

3. Scorer: This module calculates the dissimilarity of the Extract Method candidates
generated by module Candidate Generator (refer to Subsection 2.1.2) and hence
depends on service (b) of module Code Analyzer.

4. Ranker: This module ranks and filters the Extract Method candidates generated
by module Candidate Generator and scored by module Scorer. It depends on service (c)
of module Code Analyzer to filter out candidates that do not satisfy the preconditions.

5. UI: This module consists of the front-end of the tool, which relies on the
Eclipse UI API (org.eclipse.ui) to implement two menu extensions, six
actions, and one main view. Moreover, it depends on the UI module from LTK
(org.eclipse.ltk.ui.refactoring) to delegate the application of the refactoring to the
underlying Eclipse Extract Method refactoring tool.


Such an architecture permits the extension of our tool. For example, the Scorer module
may be replaced by one that employs a new heuristic based on semantic and structural
information. As another example, the Candidate Generator module may be extended to
support the identification of non-contiguous code fragments.
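The extension point described above can be pictured with a small Java interface. This is a hypothetical sketch of how a pluggable Scorer might be wired, not JExtract's actual module API: the pipeline depends only on the Scorer interface, so a heuristic based on semantic information could replace the structural one without touching the other modules.

```java
import java.util.List;

/** Hypothetical extension point: any heuristic mapping a candidate to a dissimilarity score. */
interface Scorer<C> {
    double score(C candidate);
}

/** Hypothetical pipeline that depends only on the Scorer abstraction. */
final class RecommendationPipeline<C> {
    private final Scorer<C> scorer;

    RecommendationPipeline(Scorer<C> scorer) {
        this.scorer = scorer;
    }

    /** Returns the highest-scoring candidate, or null when the list is empty. */
    C best(List<C> candidates) {
        C best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (C candidate : candidates) {
            double s = scorer.score(candidate);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }
}
```

Swapping heuristics then amounts to constructing the pipeline with a different Scorer instance.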

Figure 3 presents JExtract's UI, displaying method mouseReleased previously
presented in Figure 2. When a developer triggers JExtract to identify Extract Method
refactoring opportunities for this method, it opens the Extract Method Recommendations
view to report the potential recommendations. In this case, the best candidate consists of
the extraction of statements S3.2 to S3.5, whose dissimilarity score is 0.7148.

2.3. Evaluation 

We conducted two different but complementary empirical studies. 

Study #1: In our previous paper [9], we evaluated the recommendations provided by our
tool on three systems to assess precision and recall. We extended this study to consider
minor modifications to the ranking method and to compare the results with JDeodorant, a
state-of-the-art tool that identifies Extract Method opportunities [11]. For each system S,
we applied random Inline Method refactoring operations to obtain a modified version S'.
We assume that good Extract Method opportunities are the ones that revert the
modifications (i.e., restoring S from S').

Figure 3. JExtract UI


Table 1. Study #1 - Recall and precision results

                      JExtract Top-1     JExtract Top-2     JExtract Top-3     JDeodorant
System         #      Recall     Prec.   Recall     Prec.   Recall     Prec.   Recall    Prec.
JHotDraw 5.2   56     19 (34%)   34%     26 (46%)   24%     32 (57%)   20%     2 (4%)    5%
JUnit 3.8      25     13 (52%)   52%     16 (64%)   33%     18 (72%)   25%     0 (0%)    0%
MyWebMarket    14     12 (86%)   86%     14 (100%)  50%     14 (100%)  33%     2 (14%)   33%
Total          95     44 (46%)   46%     56 (59%)   30%     64 (67%)   23%     4 (4%)    6%


Table 1 reports recall and precision values achieved using JExtract with three
different configurations (Top-k Recommendations per Method). While a high parameter value
favors recall (e.g., Top-3), a low one favors precision (e.g., Top-1). Table 1 also presents
results achieved using JDeodorant with its default settings. As the main finding, JExtract
outperforms JDeodorant regardless of the configuration used.
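Study #1 counts a recommendation as correct only when it restores exactly the fragment removed by Inline Method. The sketch below makes that bookkeeping explicit under our own assumptions: each modified method has at most one oracle fragment, fragments are identified by a string such as a statement range, recall divides exact matches by the oracle size (the # column), and precision divides them by the recommendations actually issued. OracleMetrics and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;

final class OracleMetrics {
    /** Counts methods whose recommendation list contains the oracle fragment exactly. */
    static int hits(Map<String, String> oracle, Map<String, List<String>> recommendations) {
        int hits = 0;
        for (Map.Entry<String, List<String>> entry : recommendations.entrySet()) {
            String expected = oracle.get(entry.getKey());
            // A fragment differing by a single statement is a miss.
            if (expected != null && entry.getValue().contains(expected)) {
                hits++;
            }
        }
        return hits;
    }

    /** Recall: exact matches over the number of fragments in the oracle. */
    static double recall(int hits, int oracleSize) {
        return (double) hits / oracleSize;
    }

    /** Precision: exact matches over recommendations issued (NaN if none were issued). */
    static double precision(int hits, Map<String, List<String>> recommendations) {
        int issued = 0;
        for (List<String> recs : recommendations.values()) {
            issued += recs.size();
        }
        return (double) hits / issued;
    }
}
```

For instance, the Total row for Top-1 gives recall(44, 95) ≈ 0.463, reported as 46% in Table 1.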

Study #2: We replicated the previous study on ten other popular open-source Java systems
to assess how the precision and recall rates would vary. However, we do not compare
these results with JDeodorant, since we were not able to reliably provide the source code
of all required libraries, as demanded by JDeodorant.

Table 2 reports the recall and precision values achieved using the same settings as
the previous study. On one hand, the overall recall increases from 25.0% (Top-1) to 49.2%
(Top-3). On the other hand, the overall precision decreases from 25.0% (Top-1) to 16.7%
(Top-3). We argue these values are acceptable for two reasons: (i) we only consider as
correct a recommendation that exactly matches the one in the oracle; thus, including (or
excluding) a single statement is enough for a recommendation to be considered a miss; and
(ii) the modified methods may have preexisting Extract Method opportunities, besides the
ones we introduced, which our oracle counts as wrong.








































Table 2. Study #2 - Recall and precision results

                           JExtract Top-1         JExtract Top-2         JExtract Top-3
System              #      Recall         Prec.   Recall         Prec.   Recall         Prec.
Ant 1.8.2           964    235 (24.4%)    24.4%   363 (37.7%)    19.1%   460 (47.7%)    16.3%
ArgoUML 0.34        439    98 (22.3%)     22.3%   160 (36.4%)    18.3%   186 (42.4%)    14.4%
Checkstyle 5.6      533    227 (42.6%)    42.6%   338 (63.4%)    31.9%   389 (73.0%)    24.7%
FindBugs 1.3.9      714    179 (25.1%)    25.1%   278 (38.9%)    19.7%   350 (49.0%)    16.7%
FreeMind 0.9.0      348    85 (24.4%)     24.4%   134 (38.5%)    19.4%   181 (52.0%)    17.8%
JFreeChart 1.0.13   1,090  204 (18.7%)    18.7%   396 (36.3%)    18.2%   536 (49.2%)    16.5%
JUnit 4.10          35     11 (31.4%)     32.4%   17 (48.6%)     26.6%   22 (62.9%)     23.7%
Quartz 1.8.3        239    99 (41.4%)     41.4%   125 (52.3%)    26.5%   142 (59.4%)    20.4%
SQuirreL SQL 3.1.2  39     15 (38.5%)     38.5%   18 (46.2%)     23.7%   20 (51.3%)     18.2%
Tomcat 7.0.2        1,076  214 (19.9%)    19.9%   325 (30.2%)    15.2%   409 (38.0%)    12.8%
Total               5,477  1,367 (25.0%)  25.0%   2,154 (39.3%)  19.8%   2,695 (49.2%)  16.7%


3. Related Tools 

Recent empirical research shows that automated refactoring tools, especially those
supporting Extract Method refactorings, are most of the time underused [5, 4]. In view of
such circumstances, recent studies on the identification of refactoring opportunities seek
to address this shortcoming. In this paper, we implemented our approach so that it can be
straightforwardly incorporated into the current development process through a tool that
identifies, ranks, and automates Extract Method refactoring opportunities [HI].

JMove is the refactoring recommendation system our approach is inspired by [7, 6].
The tool identifies Move Method refactoring opportunities based on the similarity
between dependency sets [7]. More specifically, it computes the similarity of the set of
dependencies established by a given method m with (i) the methods of its own class C1
and (ii) the methods in other classes of the system (C2, C3, ..., Cn). Whereas JMove
recommends moving a method m to a more similar class Ci, our current approach recommends
extracting a fragment from a given method m into a new method m' when there is a high
dissimilarity between m' and the remaining statements in m.

JDeodorant is the state-of-the-art system to identify and apply common refactoring
operations in Java systems, including Extract Method [11]. In contrast to our approach,
which relies on the similarity between dependency sets, JDeodorant relies on the concept
of program slicing to select related statements that can be extracted into a new method.
Our approach, on the other hand, is not based on specific code patterns (such as a
computation slice). It is also more conservative in preserving program behavior (although
it is currently restricted to contiguous fragments of code), and it relies on a scoring
function to rank and filter recommendations.

There are other techniques to identify refactoring opportunities based, for example,
on search-based algorithms, Relational Topic Models (RTM) [1], metrics-based
rules [3], etc., that can be adapted to identify Extract Method refactoring opportunities.

4. Final Remarks 

JExtract implements a novel approach for recommending automated Extract Method
refactorings. The tool was designed as a plug-in for the Eclipse IDE that automatically
identifies, ranks, and applies the refactoring. Thus, the tool may contribute to increasing
the popularity of IDE-based refactoring tools, which most recent empirical studies on
refactoring consider underused. Moreover, our evaluation indicates that JExtract is more
effective (w.r.t. recall and precision) than JDeodorant, a state-of-the-art tool, at
identifying contiguous misplaced code in methods.

As ongoing work, we are extending JExtract to reorder statements in order to
uncover better Extract Method opportunities, as long as the modification preserves
the behavior of the original code. Moreover, we intend to evaluate our tool with human
experts to mitigate the threat that the synthesized datasets did not capture the full
spectrum of Extract Method instances faced by developers. Last, we also intend to support
other kinds of refactoring (e.g., Move Method).

The JExtract tool, including its source code, is publicly available at
http://aserg.labsoft.dcc.ufmg.br/jextract.